Deep learning fundamentals and reinforcement learning, with an interactive Q-learning maze that demonstrates the agent–environment loop end to end.
Multi-layer neural networks trained end-to-end with backpropagation.
hk+1 = σ(Wk hk + bk). Universal approximator; no built-in spatial or temporal structure.softmax(QKT/√d) V replaces recurrence; the basis of modern LLMs and ViTs.Gradients propagate via the chain rule through the computation graph:
∂L/∂θ = (∂L/∂z) · (∂z/∂θ)
Modern frameworks (PyTorch, JAX) build the graph dynamically and apply reverse-mode autodiff. Training stability is then a matter of initialisation, normalisation (BatchNorm, LayerNorm), and residual connections.